Classification, Regression, and Feature Selection on Matrix Data

Authors

  • Sepp Hochreiter
  • Klaus Obermayer
Abstract

We describe a new technique for the analysis of data given in matrix form. We consider two sets of objects, the “row” and the “column” objects, and we represent these objects by a matrix of numerical values which describe their mutual relationships. We then introduce a new technique, the “Potential Support Vector Machine” (P-SVM), as a large-margin-based method for the construction of classifiers and regression functions for the “column” objects. Contrary to standard support vector machine (SVM) approaches, the P-SVM minimizes a scale-invariant capacity measure under a new set of constraints. As a result, the P-SVM can handle data matrices which are neither positive definite nor square, and leads to a usually sparse expansion of the classification boundary or the regression function in terms of the “row” rather than the “column” objects. We introduce two complementary regularization schemes in order to avoid overfitting for noisy data sets. The first scheme improves generalization performance for classification and regression problems; the second scheme leads to the selection of a small and informative set of “row” objects and can be applied to feature selection. A fast optimization algorithm based on the “Sequential Minimal Optimization” (SMO) technique is provided. We first apply the new method to so-called pairwise data, i.e., data where “row” and “column” objects are from the same set. Pairwise data can be represented in two ways. The first representation uses vectorial data and constructs a Gram matrix from feature vectors using a kernel function. Benchmark results show that the P-SVM method provides superior classification and regression results and has the additional advantage that kernel functions are no longer restricted to be positive definite. The second representation uses a measured matrix of mutual relations between objects rather than vectorial data. The new classification and regression method performs very well compared to standard techniques on benchmark data sets. More importantly, however, experiments show that the P-SVM can be used very effectively for feature selection. Then we apply the P-SVM to genuine matrix data, where “row” and “column” objects …
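The abstract notes that, for pairwise data given as feature vectors, the data matrix is a Gram matrix built from a kernel function, and that the P-SVM does not require this kernel to be positive definite. The following minimal Python sketch is illustrative only; the function names and the choice of the sigmoid (hyperbolic tangent) kernel are our own assumptions, not the authors' implementation. It builds such a Gram matrix and inspects its eigenvalues, which may be negative for an indefinite kernel.

```python
import numpy as np

def tanh_kernel(x, z, gamma=0.1, c=-1.0):
    """Sigmoid kernel; in general it is not positive definite."""
    return np.tanh(gamma * np.dot(x, z) + c)

def gram_matrix(X, kernel):
    """K[i, j] = kernel(x_i, x_j) over all pairs of objects."""
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3))       # 5 objects, 3 features each
    K = gram_matrix(X, tanh_kernel)   # pairwise-data matrix for the P-SVM setting
    eigvals = np.linalg.eigvalsh(K)   # negative eigenvalues indicate indefiniteness
    print(K.shape, eigvals)
```

A matrix like this could not be used directly by a standard kernel SVM without correction, whereas the P-SVM, as described above, does not require positive definiteness of the data matrix.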


Similar articles

Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...


Improving Chernoff criterion for classification by using the filled function

Linear discriminant analysis is a well-known matrix-based dimensionality reduction method. It is a supervised feature extraction method used in two-class classification problems. However, it is incapable of dealing with data in which classes have unequal covariance matrices. To address this issue, the Chernoff distance is an appropriate criterion to measure distances between distributions. In the p...


Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines

In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely search-based and non-search-based procedures, as well as evaluation criteria and data mining tasks, are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...


A New Framework for Distributed Multivariate Feature Selection

Feature selection is considered an important issue in the classification domain. Selecting good features, through maximum relevance to the class label and minimum redundancy among features, improves classification accuracy. However, most current feature selection algorithms work only as centralized methods. In this paper, we suggest a distributed version of the mRMR featu...



Feature Selection Based on Genetic Algorithm in the Diagnosis of Autism Disorder by fMRI

Background: Autism Spectrum Disorder (ASD) involves persistent deficits in a person’s verbal skills and in visual, auditory, tactile, and social behavior. Over the last two decades, one of the most important approaches to studying brain function in autistic persons has been functional Magnetic Resonance Imaging (fMRI). Objectives: It is common to use all brain regions in functional extracti...



Journal:

Volume   Issue

Pages  -

Publication date: 2004